PDX-481: fix(prompts): rewrite author-test flow to single-call construction by mrdailey99 · Pull Request #173 · ProvarTesting/provardx-cli

mrdailey99 · 2026-05-15T13:50:42Z

Summary

Rewrites the provar.guide.orchestration author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section to mandate single-call construction of test cases (full steps[] tree in one provar_testcase_generate call)
Marks provar_testcase_step_edit as amend-only — never for constructing new cases from an empty skeleton
Splits the orchestration prerequisite graph so generate (construct) and step_edit (amend) appear as distinct entry points
Mirrors the construct-vs-amend split in docs/mcp.md and adds pilot-guide Scenario 12 so the regression class is covered in evaluation
Adds a regression-guard unit suite (canonical phrasing + multi-scenario snapshot) and a JSON-RPC pdx-481-validate.cjs for protocol-surface verification

Jira

https://provartesting.atlassian.net/browse/PDX-479 (main bug — root cause confirmed)
https://provartesting.atlassian.net/browse/PDX-480 (validation — confirmed PR PDX-0: feat(mcp): add user-facing guide prompts, tool-guide resource, and git-workflow command #153 is the cause; basic LLM-driven generation against patched binaries produces correct output)
https://provartesting.atlassian.net/browse/PDX-481 (this fix)
https://provartesting.atlassian.net/browse/PDX-482 (related hardening story — tool description contract)

Why

The 1.5.0 regression (PDX-479) traced to two artifacts shipped in PR #153 that both steered LLMs toward a generate-empty + step_edit-per-step authoring pattern. That pattern caused scenarios to be dropped, asserts to be emitted as flat siblings instead of nested under UiWithScreen clauses, and assert API IDs to drift across the case. PDX-480 mechanically + behaviorally confirmed disabling those artifacts restores correct generation. This PR keeps the helpful guidance but rewrites it to the correct pattern.

Test plan

yarn compile clean
node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts" — 1118 passing, 0 failing (12 new tests from this PR)
yarn lint clean
node scripts/pdx-481-validate.cjs — 9 / 9 PASS: author-test flow contains "single call", "ALL steps", "amend"; excludes "repeat per step"; prerequisite graph split confirmed; tool-guide resource serves the rewritten copy
node scripts/mcp-smoke.cjs — not run; environmental (global sf not linked to local worktree plugin) — verified the same blocker affects the develop tip, not introduced by this PR
Reviewer to run an MCP client (Claude Desktop / Cursor) against the local build with a multi-scenario authoring prompt and confirm single-call construction is followed and the resulting XML is structurally clean

Changes

src/mcp/prompts/guidePrompts.ts: rewrote 'author-test' flow + split the generate/step_edit lines in the prerequisite graph
docs/PROVAR_TOOL_GUIDE.md: rewrote "I want to write a new test" section
docs/mcp.md: added construct-pattern callout on provar_testcase_generate; added amend-only callout on provar_testcase_step_edit
docs/mcp-pilot-guide.md: added Scenario 12 "Construct a Multi-Scenario Test Case in a Single Call"
test/unit/mcp/guidePrompts.test.ts (new): registration + canonical-phrasing assertions for the author-test flow and the prerequisite graph
test/unit/mcp/testCaseGenerate.test.ts: added "multi-scenario single-call construction (PDX-481 regression guard)" describe block with 4 tests (12-step payload sequential testItemIds; scenario markers preserved; assert API ID consistency; non-SF target_uri nests correctly)
scripts/pdx-481-validate.cjs (new): JSON-RPC validation harness covering 9 protocol-surface assertions

Out of scope

testCaseGenerate.ts / testCaseStepTools.ts / testCaseValidate.ts are unchanged — they were byte-identical to the known-good 1.5.0-beta.17 build
Tool description hardening (PDX-482) lands separately as defence-in-depth

Customer-facing docs

docs/provar-mcp-public-docs.md and docs/university-of-provar-mcp-course.md are maintained separately by the Provar team and may need a mirror update if they describe the authoring sequence.

🤖 Generated with Claude Code

…uction RCA: PR #153 (PDX-479 regression) shipped two artifacts — the provar.guide.orchestration prompt's author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section — that both steered LLMs toward generate-empty-then-step_edit-per-step authoring. PDX-480 confirmed locally that disabling these restores correct generation. This patch keeps the helpful guidance but rewrites it to recommend a single provar_testcase_generate call carrying the full step tree; step_edit is now explicitly marked amend-only. Fix: Rewrote the author-test flow in src/mcp/prompts/guidePrompts.ts and the matching section in docs/PROVAR_TOOL_GUIDE.md to mandate single-call construction. Split the orchestration prerequisite graph so provar_testcase_generate and provar_testcase_step_edit appear as distinct entry points with construct-vs-amend annotations. Added construct-vs-amend callouts to docs/mcp.md tool sections. Added pilot-guide Scenario 12 covering the multi-scenario single-call expectation. Added unit tests asserting the canonical phrasing (and absence of "repeat per step") plus a multi-scenario snapshot test that drives 12 steps through provar_testcase_generate in one call and asserts consecutive testItemIds, preserved scenario markers, consistent assert API IDs, and correct UiWithScreen nesting. Added scripts/pdx-481-validate.cjs for direct JSON-RPC verification of the fix at the protocol surface.

github-actions · 2026-05-15T13:51:11Z

Quality Orchestrator

🟢 LOW · 2 / 100 · All changed files have mapped tests.

🧪 Tests to Run · Running 1 of 51 tests

unit/mcp/guidePrompts.test.ts

▶ Run command

npx vitest run \
  unit/mcp/guidePrompts.test.ts

_{⚡ quality-orchestrator · /qo stub <file> · qo analyze-local}

Copilot

Pull request overview

Rewrites the MCP author-test orchestration guidance and related documentation to mandate single-call test case construction via provar_testcase_generate (passing the full steps[] tree at once), and marks provar_testcase_step_edit as amend-only. This addresses the PDX-479 regression where step-by-step authoring caused dropped scenarios, flat asserts, and inconsistent step types. No production logic in testCaseGenerate.ts is modified — the change is to guidance copy plus regression-guard tests and a JSON-RPC validation script.

Changes:

Rewrite author-test flow in guidePrompts.ts and PROVAR_TOOL_GUIDE.md; split generate vs step_edit in the prerequisite graph and docs/mcp.md callouts; add Scenario 12 to the pilot guide.
Add new test suites: guidePrompts.test.ts (registration + canonical phrasing) and a multi-scenario regression block in testCaseGenerate.test.ts.
Add scripts/pdx-481-validate.cjs JSON-RPC harness to verify the rewritten copy and prerequisite-graph split via the MCP protocol surface.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/mcp/prompts/guidePrompts.ts	Rewrites the `author-test` flow steps and splits `generate`/`step_edit` entries in the prerequisite graph.
docs/PROVAR_TOOL_GUIDE.md	Rewrites "I want to write a new test" section with single-call construction guidance and amend-only step_edit usage.
docs/mcp.md	Adds construct-pattern callout to `provar_testcase_generate` and amend-only callout to `provar_testcase_step_edit`.
docs/mcp-pilot-guide.md	Adds Scenario 12 covering multi-scenario single-call construction for pilot evaluation.
test/unit/mcp/guidePrompts.test.ts	New unit suite verifying prompt registration and canonical author-test phrasing.
test/unit/mcp/testCaseGenerate.test.ts	Adds a "multi-scenario single-call construction" describe block with 4 regression-guard tests.
scripts/pdx-481-validate.cjs	New JSON-RPC validation harness asserting the rewritten copy is reachable and contains canonical phrasing.

Comments suppressed due to low confidence (1)

test/unit/mcp/guidePrompts.test.ts:136

The regex relies on default | precedence which makes it provar_testcase_generate[^\n]*construct OR construct[^\n]*provar_testcase_generate. While this works, the unbounded [^\n]* can match across many tokens within the same line, so this could pass even if the words happen to appear on the same line in an unrelated context (e.g. another tool name between them). Consider tightening the regex (e.g. a bounded character window or matching specific punctuation like →/() to ensure the test actually verifies that provar_testcase_generate is annotated as the construct entry point.

    assert.ok(
      /provar_testcase_generate[^\n]*construct|construct[^\n]*provar_testcase_generate/i.test(text),
      'prerequisite graph must annotate provar_testcase_generate as the construct entry point'
    );
    assert.ok(
      /provar_testcase_step_edit[^\n]*amend|amend[^\n]*provar_testcase_step_edit/i.test(text),
      'prerequisite graph must annotate provar_testcase_step_edit as the amend entry point'
    );

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mrdailey99 · 2026-05-15T14:15:06Z

+      !text.includes('repeat as needed') || text.includes('amend'),
+      'author-test flow must not say "repeat as needed" without also clarifying step_edit is for amendments only'


Fixed in 706cb9f. The OR clause !text.includes("repeat as needed") || text.includes("amend") short-circuited to true on every run because "amend" appears throughout the rewritten author-test copy. Replaced with the unconditional !text.includes("repeat as needed") — the amend-only contract is already covered by the dedicated "author-test flow marks step_edit as amendment-only" test on line 113.

mrdailey99 · 2026-05-15T14:15:11Z

+  let pass = 0;
+  let fail = 0;
+  for (const r of results) {
+    console.log(`${r.ok ? '[PASS]' : '[FAIL]'} ${r.label} — ${r.detail}`);
+    r.ok ? pass++ : fail++;
+  }
+  console.log(`\nPDX-481 validation: ${pass} passed, ${fail} failed`);
+
+  server.stdin.end();
+  process.exit(fail > 0 ? 1 : 0);
+})().catch((err) => {
+  console.error('Validation script error:', err);
+  server.kill();
+  process.exit(2);


Partially addressed in 706cb9f. Replaced the ternary statement r.ok ? pass++ : fail++; with an explicit if/else block. On the cleanup concern: server.kill() is already called in the failure path (the script ends with .catch((err) => { console.error(...); server.kill(); process.exit(2); }) — see lines 152-156). That kills the child process before exit on rpc timeout / aborted handshake / any thrown error. Going further (synchronous wait for exit event before process.exit) would add complexity for a script that always exits immediately afterward and is only invoked as a one-shot validator, so leaving cleanup as-is.

RCA: Copilot review flagged two real defects in the PDX-481 regression-guard test and validation harness: (1) the "repeat as needed" assertion in guidePrompts.test.ts had an OR-clause that short-circuited to true because "amend" appears repeatedly elsewhere in the flow, making the assertion a no-op; (2) the prerequisite-graph regex used unbounded [^\n]* so it could match unrelated tokens between the two words on the same line; and one style nit on validate.cjs using a ternary as a statement. Fix: Made the "repeat as needed" assertion unconditional so it actually protects against the anti-pattern phrasing being reintroduced. Tightened the prerequisite-graph regex to require the exact annotation punctuation (provar_testcase_generate\s*\(construct / provar_testcase_step_edit\s*\(amend) so it cannot pass on unrelated text. Replaced the ternary-as-statement counter in pdx-481-validate.cjs with an if/else block. Also added scripts/pdx-481-trace.cjs (the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence). All gate checks pass: 1118 mocha tests, lint clean, validation 9/9.

mrdailey99 · 2026-05-15T14:15:36Z

Copilot review follow-ups — addressed in `706cb9f`

#	File	Copilot finding	Action
1	`test/unit/mcp/guidePrompts.test.ts:106`	`!text.includes('repeat as needed') \|\| text.includes('amend')` short-circuits to true ("amend" appears throughout the copy), making the assertion a no-op	✅ Fixed — unconditional `!text.includes('repeat as needed')`. The amend-only contract is still covered by the dedicated test on line 113
2	`scripts/pdx-481-validate.cjs:152`	`r.ok ? pass++ : fail++;` ternary-as-statement	✅ Fixed — explicit `if (r.ok) pass++; else fail++;` block
2 (cont.)	same	Server cleanup on timeout/abort	❌ Already handled — the `.catch` on the IIFE (lines 152-156) calls `server.kill()` before `process.exit(2)`. Going further to wait for `exit` event would over-engineer a one-shot validator
3 (suppressed-low-confidence)	`test/unit/mcp/guidePrompts.test.ts:130, 134`	Unbounded `[^\n]*` regex could match unrelated tokens between the two words on the same line	✅ Fixed — tightened to `provar_testcase_generate\s\(construct` and `provar_testcase_step_edit\s\(amend`, anchoring on the actual `(annotation)` punctuation emitted by the prerequisite-graph copy

Gate after fixes

yarn compile → clean
node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts" → 1118 passing, 0 failing
yarn lint → clean
node scripts/pdx-481-validate.cjs → 9 / 9 PASS

Bonus inclusion

The follow-up commit also adds scripts/pdx-481-trace.cjs — the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence. Useful for any reviewer who wants to re-verify locally:

node scripts/pdx-481-trace.cjs

Drives the patched server and dumps the exact bytes an MCP client surfaces to its LLM for: the orchestration prompt's author-test flow (TRACE 1), the tool-guide resource (TRACE 2), both tool descriptions (TRACEs 3 + 4), and a real multi-scenario provar_testcase_generate call (TRACE 5).

Copilot AI review requested due to automatic review settings May 15, 2026 13:50

Copilot started reviewing on behalf of mrdailey99 May 15, 2026 13:51 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

mrdailey99 merged commit 5276f5e into develop May 15, 2026
4 checks passed

mrdailey99 mentioned this pull request May 15, 2026

PDX-482: feat(mcp): harden testcase tool descriptions for single-call contract #174

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PDX-481: fix(prompts): rewrite author-test flow to single-call construction#173

PDX-481: fix(prompts): rewrite author-test flow to single-call construction#173
mrdailey99 merged 2 commits into
developfrom
fix/PDX-481-rewrite-author-test-guidance

mrdailey99 commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

mrdailey99 May 15, 2026

Uh oh!

mrdailey99 May 15, 2026

Uh oh!

mrdailey99 commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		!text.includes('repeat as needed') \|\| text.includes('amend'),
		'author-test flow must not say "repeat as needed" without also clarifying step_edit is for amendments only'

Conversation

mrdailey99 commented May 15, 2026

Summary

Jira

Why

Test plan

Changes

Out of scope

Customer-facing docs

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Quality Orchestrator

🧪 Tests to Run · Running 1 of 51 tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

mrdailey99 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

mrdailey99 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

mrdailey99 commented May 15, 2026

Copilot review follow-ups — addressed in 706cb9f

Gate after fixes

Bonus inclusion

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 15, 2026 •

edited

Loading

Copilot review follow-ups — addressed in `706cb9f`